Collection ergonomics #990

Yoshanuikabundi · 2024-06-12T11:30:14Z

Description

Implements easier indexing into collections and dictionaries of potentials. Closes #968.

Works by defining equality between certain TopologyKey subclasses and tuples of atom indices when non-atom index attributes are None. Also implements __getitem__ on Collection to index directly from atom indices to potential.

>>> from openff.toolkit import ForceField, Molecule
>>> interchange = ForceField("openff-2.2.0.offxml").create_interchange(
...     Molecule.from_mapped_smiles("[H:1][O:2][H:3]").to_topology()
... )
>>> interchange['Bonds'].key_map[1, 2]
PotentialKey associated with handler 'Bonds' with id '[#8:1]-[#1:2]'
>>> interchange['Bonds'][1, 2]
Potential(parameters={'k': <Quantity(1076.70159, 'kilocalorie_per_mole / angstrom ** 2')>, 'length': <Quantity(0.975230101, 'angstrom')>}, map_key=None)
>>>

I had to change the hash implementations to not include None in the hash if all non-atom indices attributes were None so that the hash matches between the TopologyKey and the appropriate tuple. This might occasionally cause collisions... but it should be extremely rare and regardless Python's requirement is that __eq__ implies hashes are equal, not the converse.

Atom indices for bonds have to be in the correct order. Fixing this for indexing with tuples would require replacing Collection.key_map with a custom class. We could make the ergonomics easier by guaranteeing that atom indices in bonds are always ordered a certain way?

Checklist

Add tests
Lint
Update docstrings

…ndices

for more information, see https://pre-commit.ci

codecov · 2024-06-12T11:35:21Z

Codecov Report

Attention: Patch coverage is 96.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 91.28%. Comparing base (6a1fb9e) to head (ed12deb).

Additional details and impacted files

mattwthompson · 2024-06-14T17:01:21Z

We could make the ergonomics easier by guaranteeing that atom indices in bonds are always ordered a certain way?

Seems worth it to me - but I'm not sure about other valence terms

mattwthompson · 2024-06-14T20:46:40Z

Okay, I see you've gone about sorting the other terms. I think this breaks angles and dihedrals, though, since they're not symmetric like bonds?

Yoshanuikabundi · 2024-07-10T04:20:31Z

We could make the ergonomics easier by guaranteeing that atom indices in bonds are always ordered a certain way?

Seems worth it to me - but I'm not sure about other valence terms

We'd have to do it everywhere because this PR just indexes in directly. It might be better to just look up all equivalent index orderings?

Okay, I see you've gone about sorting the other terms.

Welp I didn't mean to, I'll add tests and double check

EDIT: Yeah could you point out where I've done this?

I think this breaks angles and dihedrals, though, since they're not symmetric like bonds?

Yeah I agree, you can reverse the order but I think that's the only symmetry op for both?

Yoshanuikabundi · 2024-07-10T05:35:29Z

OK I've just made it so that when indexing into a collection with a tuple, both the forward and reverse of the tuple will be looked up. I think this is a lot less intrusive than guaranteeing everywhere that the tuples are ordered. My understanding is that all valence terms can be indexed in either direction, but if this isn't the case then I can change it to only look up the reverse for 2-tuples.

mattwthompson · 2024-07-10T14:25:48Z

could you point out where I've done this?

Nope, I think I got confused looking too superficially at which tests you've added

mattwthompson · 2024-07-10T14:26:15Z

Am I meant to be able to look up other valence terms, or is this messed up by all of those extra fields (some of which might not be necessary)?

In [12]: interchange = ForceField("openff-2.2.0.offxml").create_interchange(Molecule.from_smiles("
    ...: CCO").to_topology())

In [13]: next(iter(interchange['ProperTorsions'].key_map))
Out[13]: ProperTorsionKey with atom indices (0, 1, 2, 8), mult 0

In [14]: interchange['ProperTorsions'].key_map[0, 1, 2, 8]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[14], line 1
----> 1 interchange['ProperTorsions'].key_map[0, 1, 2, 8]

KeyError: (0, 1, 2, 8)

Yoshanuikabundi · 2024-07-11T09:02:56Z

Yeah that's on purpose - since I implemented this by defining __eq__ on the TopologyKey subclasses, you have to include all the information in the tuple - otherwise you violate a == b implies hash(a) == hash(b) and run into issues if you had two entries with the same atom indices. I do this implicitly by only making tuples equal to a TopologyKey when the non-indices fields of the TopologyKey are the defaults.

I thought about allowing more information to be given in the tuple - for example, to index the above torsion you could do something like:

interchange['ProperTorsions'].key_map[(0, 1, 2, 8), 0]
# Or
interchange['ProperTorsions'].key_map[(0, 1, 2, 8), 0, None, None]

But since users would have to remember the correct ordering for everything and there's no way to disambiguate the name of the field you're specifying and all the fields have the same type I thought this provided a relatively small ergonomics boost. If users want to specify the extended fields, they can still index with a ProperTorsionsKey (or whatever), which provides that documentation (via autocomplete) and disambiguation (via named arguments). I chose the existing behavior as the best ratio of ergonomics improvement to amount of new code, but I made this judgement assuming the default values of all fields would cover the majority of use cases.

Here are some ways I've thought about dealing with this:

Leave things as they are
Implement equality between extended tuples as above (or some subset)
If mult=0 and mult=None are equivalent, change the default value of mult (and the value that __eq__ and __hash__ assume) to 0 (for bonus points, also change the type from int | None to int)
If mult=0 is more common than mult=None but a breaking change is not desired, change just the value that __eq__ and __hash__ assume to 0 (I think this would be more confusing than 3)
Implement a more detailed lookup in Collections.__getitem__ and if the given key is unambiguous wrt the existing keys, use it - the downside of this is that since we don't know the correct value ahead of time (or even the correct type) we'd actually have to iterate over all the existing keys to find possible matches, which changes worst case lookup from O(1) to O(n).

Note also that indexing into collection is different to indexing into collection.key_map, as I didn't want to change key_map away from being a simple dict:

In [10]: next(iter(interchange['Bonds'].key_map))
Out[10]: BondKey with atom indices (0, 1)

In [11]: interchange['Bonds'][0, 1] # Returns the underlying potential
Out[11]: Potential(parameters={'k': <Quantity(430.602781, 'kilocalorie_per_mole / angstrom ** 2')>, 'length': <Quantity(1.5336276, 'angstrom')>}, map_key=None)

In [12]: interchange['Bonds'][1, 0] # Works with reversed keys
Out[12]: Potential(parameters={'k': <Quantity(430.602781, 'kilocalorie_per_mole / angstrom ** 2')>, 'length': <Quantity(1.5336276, 'angstrom')>}, map_key=None)

In [13]: interchange['Bonds'].key_map[0, 1] # Returns the PotentialKey
Out[13]: PotentialKey associated with handler 'Bonds' with id '[#6X4:1]-[#6X4:2]'

In [14]: interchange['Bonds'].key_map[1, 0] # Doesn't work with reversed keys
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[14], line 1
----> 1 interchange['Bonds'].key_map[1, 0]

KeyError: (1, 0)

…omics

mattwthompson · 2024-07-11T20:24:32Z

openff/interchange/_tests/unit_tests/test_models.py

+    assert BondKey(atom_indices=(1, 3), bond_order=None) != ((1, 3), None)
+    assert BondKey(atom_indices=(1, 3), bond_order=1.5) != (1, 3)
+    assert BondKey(atom_indices=(1, 3), bond_order=1.5) != ((1, 3), None)
+    assert BondKey(atom_indices=(1, 3), bond_order=1.5) != ((1, 3), 1.5)


I'm pretty sure these two should be equal?

def __hash__(self) -> int: if self.bond_order is None: return hash(tuple(self.atom_indices)) return hash((tuple(self.atom_indices), self.bond_order))

I believe this test is correct - Lines 131, 133 and 134 are all extended tuples, which has not been implemented (as we've discussed elsewhere), and in line 133 the bond order is not None so the tuple doesn't match.

The second return line in that hash function is the existing behaviour - BondKey.__eq__ is

def __eq__(self, other) -> bool: return super().__eq__(other) or ( self.bond_order is None and other == self.atom_indices )

Ahhh I see what's going on, TopologyKey.__eq__ was changed to compare hash functions. I don't think this is generally technically correct as you might get hash collisions - Python uses a 64 bit hash, so there are 2^64 possible hashes, and a typical simulation system might have 2^16 atoms (~65k), so if you need four of them to define a torsion then you might have 2^64 possible atom index tuples... which is really filling up the pigeon holes. Granted torsions tend to be defined between nearby atoms but that's also assuming that Python's hash function is doing a good job of avoiding collisions, and it's had pathological cases in the past.

A better way might be to have a _tuple() function that gives the equivalent tuple, and then define __hash__ as return hash(self._tuple()) and __eq__ as something like return self._tuple() == other._tuple() or self._tuple() == other

Comparing hashes in __eq__ can also run into issues when comparing objects of different types.

…llection-ergonomics

mattwthompson

I think I'm past providing useful feedback on this changelog, which is long past due its green check. I want to point this to the develop branch which targets an upcoming 0.4.x line; one test confuses me but otherwise please merge this when you're happy with it (% any confusion I've added).

mattwthompson · 2024-07-11T20:36:27Z

I get why equality across extended tuples is tricky; given that the motivation of this behavior is making our lives easier, I'm fine leaving that door closed.

mult in general is ... not something I'm completely happy with but also something I'm not eager to change. BondKey basically representing a tuple with some extra (optional!) flavors feels right, but how periodic torsions are handled feels clunkier. Not to be dealt with here, but I think that's a prime candidate for a future refactor.

I looked through the docs to find a good place to show this off, but I didn't find anything. In the future that might be worth a subsection, either in the static docs or a notebook(s) somewhere. I expected there to be tons of these excessiely-difficult-lookup cases, but now I think those are just in tests and the source code. I don't see much of a reason to rewrite what's there, but with any luck I'll remember this trick and future code will be easier to write and a little bit more sightly to look at.

…ng more robust

Yoshanuikabundi · 2024-07-12T10:32:46Z

Ok well I implemented extended tuples as a way to keep the TopologyKey.__eq__ implementation simple and correct while ensuring it's consistent with TopologyKey.__hash__. I've added an example of indexing to the Interchange API docs and also to the part of the user guide that describes the inner workings of Collection.

There are still a few topology key classes that are not subclasses of TopologyKey because they correspond to a single atom - I think the best way forward there would just be to make the single atom index field a property that points to self.atom_indices[0], but I haven't implemented this.

If you're happy with this change set, I'm happy for you to merge it at your convenience!

Yoshanuikabundi and others added 4 commits June 12, 2024 20:40

Add lammps files created by tests to .gitignore

9382d44

Implement equality checking between TopologyKeys and tuples of atom i…

0489229

…ndices

Implement indexing on collections

bfb8acf

[pre-commit.ci] auto fixes from pre-commit.com hooks

d4db57c

for more information, see https://pre-commit.ci

Yoshanuikabundi added 2 commits July 10, 2024 15:15

Test equality of TopologyKeys with reversed indices

9e59df5

Indexing into a collection with a tuple looks up forward and reverse

569d9e4

mattwthompson changed the base branch from main to develop July 11, 2024 18:45

Merge remote-tracking branch 'upstream/develop' into collection-ergon…

44feb3f

…omics

mattwthompson reviewed Jul 11, 2024

View reviewed changes

mattwthompson added 2 commits July 11, 2024 15:26

TST: Update test comparing tuple representation

959d604

Merge remote-tracking branch 'upstream/collection-ergonomics' into co…

339f46b

…llection-ergonomics

mattwthompson approved these changes Jul 11, 2024

View reviewed changes

Yoshanuikabundi added 4 commits July 12, 2024 19:56

Add extended tuples and TopologyKeys to TopologyKey tests

4d3d81a

Implement extended tuple TopologyKey equality and make equality testi…

28d2c60

…ng more robust

Add examples to docstrings

c6c9013

Add docs for collection ergonomics

ed12deb

mattwthompson merged commit e06176c into develop Jul 15, 2024
21 checks passed

mattwthompson deleted the collection-ergonomics branch July 15, 2024 17:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Collection ergonomics #990

Collection ergonomics #990

Yoshanuikabundi commented Jun 12, 2024 •

edited

Loading

codecov bot commented Jun 12, 2024 •

edited

Loading

mattwthompson commented Jun 14, 2024

mattwthompson commented Jun 14, 2024

Yoshanuikabundi commented Jul 10, 2024 •

edited

Loading

Yoshanuikabundi commented Jul 10, 2024

mattwthompson commented Jul 10, 2024

mattwthompson commented Jul 10, 2024

Yoshanuikabundi commented Jul 11, 2024 •

edited

Loading

mattwthompson Jul 11, 2024

Yoshanuikabundi Jul 12, 2024 •

edited

Loading

Yoshanuikabundi Jul 12, 2024

Yoshanuikabundi Jul 12, 2024

mattwthompson left a comment

mattwthompson commented Jul 11, 2024

Yoshanuikabundi commented Jul 12, 2024

Collection ergonomics #990

Collection ergonomics #990

Conversation

Yoshanuikabundi commented Jun 12, 2024 • edited Loading

Description

Checklist

codecov bot commented Jun 12, 2024 • edited Loading

Codecov Report

mattwthompson commented Jun 14, 2024

mattwthompson commented Jun 14, 2024

Yoshanuikabundi commented Jul 10, 2024 • edited Loading

Yoshanuikabundi commented Jul 10, 2024

mattwthompson commented Jul 10, 2024

mattwthompson commented Jul 10, 2024

Yoshanuikabundi commented Jul 11, 2024 • edited Loading

mattwthompson Jul 11, 2024

Choose a reason for hiding this comment

Yoshanuikabundi Jul 12, 2024 • edited Loading

Choose a reason for hiding this comment

Yoshanuikabundi Jul 12, 2024

Choose a reason for hiding this comment

Yoshanuikabundi Jul 12, 2024

Choose a reason for hiding this comment

mattwthompson left a comment

Choose a reason for hiding this comment

mattwthompson commented Jul 11, 2024

Yoshanuikabundi commented Jul 12, 2024

Yoshanuikabundi commented Jun 12, 2024 •

edited

Loading

codecov bot commented Jun 12, 2024 •

edited

Loading

Yoshanuikabundi commented Jul 10, 2024 •

edited

Loading

Yoshanuikabundi commented Jul 11, 2024 •

edited

Loading

Yoshanuikabundi Jul 12, 2024 •

edited

Loading